A New Framework to Deal with OOV Words in SLT System

Authors

  • Yu Zhou
  • Feifei Zhai
  • Chengqing Zong
Abstract

Automatic spoken language translation (SLT) is considered one of the most challenging tasks in modern computer science and technology. Handling Out-Of-Vocabulary (OOV) words has always been a hard problem in SLT. The traditional SLT framework often fails to translate OOV words because of data sparseness. In this paper, based on an analysis of the common OOV expressions that appear in SLT, we propose a new framework for bidirectional Chinese-English SLT, within which we present a series of approaches to translating OOV expressions. The experimental results show that our framework and approaches are effective and can greatly improve translation performance.

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Learning an Expert from Human Annotations in Statistical Machine Translation: the Case of Out-of-Vocabulary Words

We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resource...

Learning Out-of-Vocabulary Words in Automatic Speech Recognition

Out-of-vocabulary (OOV) words are unknown words that appear in the testing speech but not in the recognition vocabulary. They are usually important content words such as names and locations which contain information crucial to the success of many speech recognition tasks. However, most speech recognition systems are closed-vocabulary recognizers that only recognize words in a fixed finite vocab...

Multi Class-based n-gram Language Model for New Words Using Web Data

Out-of-vocabulary (OOV) words cause a serious problem for automatic speech recognition (ASR) systems. Not only will an OOV word be misrecognized as an in-vocabulary word with similar phonetics, but the error will also cause nearby words to be misrecognized. Language models (LMs) for most open-vocabulary ASR systems treat OOV words as one entity, ignoring the linguistic information. In this paper we pres...

Monolingual Distributional Profiles for Word Substitution in Machine Translation

Out-of-vocabulary (OOV) words present a significant challenge for Machine Translation. For low-resource languages, limited training data further increases the frequency of OOV words and degrades the quality of the translations. Past approaches have suggested using stems or synonyms for OOV words. Unlike the previous methods, we propose handling not just the OOV words but rare words as well in a...

Publication date: 2011